Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery
Authors
Abstract
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce “tabular linear functions” that generalize tile coding and linear value functions. Action-space complexity is reduced by replacing complete search over the joint action space with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, which is an afterstate version of H-Learning. Our extensions make it practical to apply reinforcement learning to the domain of product delivery, an optimization problem that combines inventory control and vehicle routing.
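The abstract names two techniques without showing their form. The following is a minimal sketch under stated assumptions, not the authors' code. It reads a tabular linear function as a table indexed by nominal features whose entries are weight vectors over numeric features. With a constant feature vector this reduces to a lookup table (one tiling of a tile coder), and with a single table entry it reduces to an ordinary linear function. It also shows one possible form of hill climbing over joint actions that changes one agent's action at a time. The names `TabularLinearFunction` and `hill_climb` and the toy scoring function are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


class TabularLinearFunction:
    """Hypothetical sketch: nominal features (key) select a weight row; the
    value is that row's dot product with the numeric feature vector phi."""

    def __init__(self, n_numeric: int) -> None:
        self.n_numeric = n_numeric
        self.weights: Dict[Hashable, List[float]] = {}

    def value(self, key: Hashable, phi: Sequence[float]) -> float:
        # Nominal combinations never updated behave as all-zero weights.
        row = self.weights.get(key, [0.0] * self.n_numeric)
        return sum(w * x for w, x in zip(row, phi))

    def update(self, key: Hashable, phi: Sequence[float], target: float, lr: float) -> None:
        # One gradient step on squared error; only the selected row changes.
        err = target - self.value(key, phi)
        row = self.weights.setdefault(key, [0.0] * self.n_numeric)
        for i, x in enumerate(phi):
            row[i] += lr * err * x


def hill_climb(
    initial: Sequence[Hashable],
    choices: Sequence[Sequence[Hashable]],
    score: Callable[[Tuple[Hashable, ...]], float],
) -> Tuple[Tuple[Hashable, ...], float]:
    """Hill climbing over joint actions (one component per agent): accept any
    single-component change that raises the score; stop when none does."""
    current = tuple(initial)
    best = score(current)
    improved = True
    while improved:
        improved = False
        for i, options in enumerate(choices):
            for a in options:
                if a == current[i]:
                    continue
                candidate = current[:i] + (a,) + current[i + 1:]
                s = score(candidate)
                if s > best:  # strict increase guarantees termination
                    current, best, improved = candidate, s, True
    return current, best


# Toy usage: two agents, three options each; the optimum is (2, 1).
joint, best = hill_climb((0, 0), [[0, 1, 2], [0, 1, 2]],
                         lambda a: -(a[0] - 2) ** 2 - (a[1] - 1) ** 2)
print(joint, best)  # (2, 1) 0
```

In a product-delivery setting, `score` could evaluate the afterstate reached by a joint truck assignment with a `TabularLinearFunction`. Each sweep then costs the sum of the per-truck option counts rather than their product, at the price of possibly stopping at a local optimum.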
Similar resources
Scaling Average-reward Reinforcement Learning for Product Delivery
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state space and action space, and high stochasticity. We give partial solutions to each of these curses that provide order-of-magnitude speedups in execution time over standard approaches. We demonstrate our methods in the domain of product delivery. We present experimental results on refinem...
A Reinforcement Learning Approach for Product Delivery by Multiple Vehicles
Real-time delivery of products in the context of stochastic demands and multiple vehicles is a difficult problem, as it requires solving inventory control and vehicle routing jointly. We model this problem in the framework of Average-reward Reinforcement Learning (ARL) and present experimental results on a model-based ARL algorithm called H-Learning with piecewise line...
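H-learning, named in the entry above and in the abstract as the basis of ASH-learning, is a model-based average-reward method. The following is a minimal tabular sketch that assumes the update form usually given for H-learning. It learns transition probabilities and mean rewards from counts and updates the average-reward estimate ρ only after greedy actions, with a decaying step size α ← α/(α+1). It then backs up h(i) ← max_a [r_i(a) + Σ_j p_ij(a) h(j)] − ρ. Exploration, the function approximation used in the papers listed here, and the afterstate variant (ASH-learning) are omitted. `HLearner` and its methods are hypothetical names.

```python
from collections import defaultdict


class HLearner:
    """Hypothetical tabular sketch of H-learning (model-based, average reward)."""

    def __init__(self, actions, alpha=1.0):
        self.actions = list(actions)
        self.h = defaultdict(float)    # relative state values h(i)
        self.rho = 0.0                 # average-reward estimate
        self.alpha = alpha             # step size for rho, decays as alpha/(alpha+1)
        self.n_sa = defaultdict(int)   # N(i, a)
        self.n_sas = defaultdict(int)  # N(i, a, j)
        self.succ = defaultdict(set)   # successors j observed after (i, a)
        self.r = defaultdict(float)    # running mean reward r_i(a)

    def backup(self, i, a):
        # r_i(a) + sum_j p_ij(a) h(j), with p_ij(a) = N(i, a, j) / N(i, a)
        n = self.n_sa[(i, a)]
        expected_h = sum(self.n_sas[(i, a, j)] / n * self.h[j] for j in self.succ[(i, a)])
        return self.r[(i, a)] + expected_h

    def tried(self, i):
        return [a for a in self.actions if self.n_sa[(i, a)] > 0]

    def greedy(self, i):
        # States with no tried action fall back to the first action (no exploration).
        tried = self.tried(i)
        return max(tried, key=lambda a: self.backup(i, a)) if tried else self.actions[0]

    def step(self, i, a, reward, j, was_greedy):
        # 1. Update the learned model of (i, a).
        self.n_sa[(i, a)] += 1
        self.n_sas[(i, a, j)] += 1
        self.succ[(i, a)].add(j)
        self.r[(i, a)] += (reward - self.r[(i, a)]) / self.n_sa[(i, a)]
        # 2. Adjust rho only on greedy steps: rho <- (1 - alpha) rho + alpha (r - h(i) + h(j)).
        if was_greedy:
            self.rho += self.alpha * (self.r[(i, a)] - self.h[i] + self.h[j] - self.rho)
            self.alpha = self.alpha / (self.alpha + 1)
        # 3. Bellman backup for the state just left, using the learned model.
        self.h[i] = max(self.backup(i, b) for b in self.tried(i)) - self.rho


# Deterministic two-state cycle: reward 1 leaving state 0, reward 0 leaving state 1.
agent = HLearner(["go"])
state = 0
for _ in range(100):
    nxt = 1 - state
    agent.step(state, "go", 1.0 if state == 0 else 0.0, nxt, was_greedy=True)
    state = nxt
print(round(agent.rho, 6))  # 0.5
```

On this cycle, ρ settles at 0.5, the true average reward per step, and h(0) exceeds h(1) by 0.5. Because the backup uses the learned model, each update takes the full expectation over observed successors rather than a single sampled next state.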
Reinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic
In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...
Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function
Almost all the work in Average-reward Reinforcement Learning (ARL) so far has focused on table-based methods, which do not scale to domains with large state spaces. In this paper, we propose two extensions to a model-based ARL method called H-learning to address the scale-up problem. We extend H-learning to learn action models and reward functions in the form of Bayesian networks, and approxima...
Continuous-Time Hierarchical Reinforcement Learning
Hierarchical reinforcement learning (RL) is a general framework that studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work in hierarchical RL, such as the MAXQ method, has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. This paper generalizes the MAXQ method to continuous-time discounte...